Abstract: In many scientific disciplines, structures in high-dimensional data have to be found, e.g., in stellar spectra, in genome data, or in face recognition tasks. In this work, we present a novel approach to non-linear dimensionality reduction. It is based on fitting K-nearest neighbor regression into the unsupervised regression framework for learning low-dimensional manifolds. Similar to related approaches that are mostly based on kernel methods, unsupervised K-nearest neighbor (UNN) regression optimizes latent variables w.r.t. the data space reconstruction error, employing the K-nearest neighbor heuristic. The problem of optimizing latent neighborhoods is difficult to solve, but the UNN formulation allows the design of efficient strategies that iteratively embed latent points into fixed neighborhood topologies. UNN is well suited for sorting high-dimensional data. The iterative variants are analyzed experimentally.

I. Introduction 

Dimensionality reduction and manifold learning play an important role in understanding data. In this work, we introduce two fast constructive heuristics for dimensionality reduction called unsupervised K-nearest neighbor regression. Meinicke [8] proposed a general unsupervised regression framework for learning low-dimensional manifolds. The idea is to reverse the regression formulation such that low-dimensional data samples in latent space optimally reconstruct high-dimensional output data. We take this framework as the basis for an iterative approach that fits KNN into this unsupervised setting in a combinatorial variant. The manifold problem we consider is a mapping $F: \mathbf{y} \mapsto \mathbf{x}$ corresponding to the dimensionality reduction for data points $\mathbf{y} \in \mathbf{Y} \subset \mathbb{R}^d$ and latent points $\mathbf{x} \in \mathbf{X} \subset \mathbb{R}^q$ with $d > q$. The problem is a hard optimization problem, as the latent variables $\mathbf{X}$ are unknown.

In Section II, we review related dimensionality reduction approaches and recapitulate KNN regression. Section III presents the concept of UNN regression and two iterative strategies that are based on fixed latent space topologies. Conclusions are drawn in Section IV.

II. Related Work 

Many dimensionality reduction methods have been proposed. A very famous one is principal component analysis (PCA), which assumes linearity of the manifold [5], [10]. An extension for learning non-linear manifolds is kernel PCA [12], which projects the data into a Hilbert space. Further famous approaches for manifold learning are Isomap by Tenenbaum, de Silva, and Langford [15], locally linear embedding (LLE) by Roweis and Saul [11], and principal curves by Hastie and Stuetzle [3].



A. Unsupervised Regression 

The work on unsupervised regression for dimensionality reduction starts with Meinicke [8], who introduced the corresponding algorithmic framework for the first time. In this line of research, early work concentrated on non-parametric kernel density regression, i.e., the counterpart of the Nadaraya-Watson estimator [9], denoted as unsupervised kernel regression (UKR). Klanke and Ritter [6] introduced an optimization scheme based on LLE, PCA, and leave-one-out cross-validation (LOO-CV) for UKR. Carreira-Perpiñán and Lu [1] argue that training of non-parametric unsupervised regression approaches is quite expensive, i.e., $O(N^3)$ in time and $O(N^2)$ in memory. Parametric methods can accelerate learning, e.g., unsupervised regression based on radial basis function networks (RBFs) [13], Gaussian processes [7], and neural networks [14].

B. KNN Regression 

In the following, we give a short introduction to K-nearest neighbor regression, which is the basis of the UNN approach. The problem in regression is to predict output values $\mathbf{y} \in \mathbb{R}^d$ for given input values $\mathbf{x} \in \mathbb{R}^q$ based on sets of $N$ input-output examples $((\mathbf{x}_1, \mathbf{y}_1), \ldots, (\mathbf{x}_N, \mathbf{y}_N))$. The goal is to learn a function $\mathbf{f}: \mathbf{x} \mapsto \mathbf{y}$ known as the regression function. We assume that a data set consisting of observed pairs $(\mathbf{x}_i, \mathbf{y}_i) \in \mathbf{X} \times \mathbf{Y}$ is given. For a novel pattern $\mathbf{x}'$, KNN regression computes the mean of the function values of its K nearest neighbors:
$$\mathbf{f}_{\mathrm{KNN}}(\mathbf{x}') = \frac{1}{K} \sum_{i \in \mathcal{N}_K(\mathbf{x}')} \mathbf{y}_i \qquad (1)$$

with set $\mathcal{N}_K(\mathbf{x}')$ containing the indices of the K nearest neighbors of $\mathbf{x}'$. The idea of KNN is based on the assumption of locality in data space: in local neighborhoods of $\mathbf{x}$, patterns are expected to have output values $\mathbf{y}$ (or class labels) similar to $\mathbf{f}(\mathbf{x})$. Consequently, for an unknown $\mathbf{x}'$ the label is expected to be similar to the labels of the closest patterns, which is modeled by the average of the output values of the K nearest samples. KNN has proven successful in various applications, e.g., in the detection of quasars in interstellar data sets [2].
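The KNN mean above can be sketched in a few lines of pure Python. The function name `knn_regress`, the use of the Euclidean distance, and breaking distance ties by index are assumptions of this illustration; the text does not fix a metric.

```python
import math


def knn_regress(x_query, X, Y, K):
    """KNN regression: mean of the outputs of the K inputs closest to x_query.

    X is a list of input vectors and Y the list of corresponding output
    vectors. Euclidean distance and index-based tie-breaking are assumptions.
    """
    # Indices of the K nearest neighbors of the query, closest first
    order = sorted(range(len(X)), key=lambda i: (math.dist(x_query, X[i]), i))
    neighbors = order[:K]
    # Component-wise mean of the neighbors' output vectors
    return [sum(Y[i][j] for i in neighbors) / K for j in range(len(Y[0]))]


X = [[0.0], [1.0], [2.0], [10.0]]
Y = [[0.0, 0.0], [2.0, 1.0], [4.0, 2.0], [20.0, 10.0]]
print(knn_regress([1.2], X, Y, 2))  # [3.0, 1.5]
```

With K = 2, the query 1.2 is closest to the inputs 1 and 2, so the prediction is the mean of their outputs.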

III. Unsupervised KNN Regression 

In this section, we introduce two iterative strategies for UNN regression based on minimization of the data space reconstruction error (DSRE) [1].



A. Unsupervised Regression 

Let $\mathbf{Y} = (\mathbf{y}_1, \ldots, \mathbf{y}_N)$ with $\mathbf{y} \in \mathbb{R}^d$ be the matrix of high-dimensional patterns in data space. We seek a low-dimensional representation, i.e., a matrix of latent points $\mathbf{X} = (\mathbf{x}_1, \ldots, \mathbf{x}_N)$, such that a regression function $\mathbf{f}$ applied to $\mathbf{X}$ "point-wise optimally reconstructs the patterns", i.e., we search for an $\mathbf{X}$ that minimizes

$$E(\mathbf{X}) = \frac{1}{N} \left\| \mathbf{Y} - \mathbf{f}(\mathbf{X}) \right\|_F^2 \qquad (2)$$

$E(\mathbf{X})$ is called the data space reconstruction error (DSRE). The latent points $\mathbf{X}$ define the low-dimensional representation. The regression function applied to the latent points should optimally reconstruct the high-dimensional patterns.
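As a minimal sketch of the DSRE for an arbitrary regression function, the following computes $(1/N)\|\mathbf{Y} - \mathbf{F}\|_F^2$ from given reconstructions. The name `dsre` is hypothetical, and patterns are stored row-wise as Python lists, which does not affect the squared Frobenius norm.

```python
def dsre(Y, F):
    """Data space reconstruction error (1/N) * ||Y - F||_F^2.

    Y holds the N patterns and F their reconstructions f(x_i), both as lists
    of N vectors of length d. Storing patterns row-wise instead of column-wise
    does not change the squared Frobenius norm.
    """
    N = len(Y)
    # Squared Frobenius norm = sum of the squared entries of the residual
    squared = sum((y_j - f_j) ** 2
                  for y, f in zip(Y, F)
                  for y_j, f_j in zip(y, f))
    return squared / N


Y = [[1.0, 2.0], [3.0, 4.0]]
F = [[1.0, 1.0], [2.0, 4.0]]
print(dsre(Y, F))  # 1.0
```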

B. UNN 

A UNN regression manifold is defined by variables $\mathbf{x} \in \mathbf{X} \subset \mathbb{R}^q$ with the unsupervised formulation of a UNN regression manifold

$$\mathbf{f}_{\mathrm{UNN}}(\mathbf{x}; \mathbf{X}) = \frac{1}{K} \sum_{i \in \mathcal{N}_K(\mathbf{x}, \mathbf{X})} \mathbf{y}_i \qquad (3)$$

Matrix $\mathbf{X}$ contains the latent points $\mathbf{x}$ that define the manifold, i.e., the low-dimensional representation of data $\mathbf{Y}$. Parameter $\mathbf{x}$ is the location where the function is evaluated. An optimal UNN regression manifold minimizes the DSRE



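$$E(\mathbf{X}) = \frac{1}{N} \left\| \mathbf{Y} - \mathbf{f}_{\mathrm{UNN}}(\mathbf{x}; \mathbf{X}) \right\|_F^2 \qquad (4)$$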



with Frobenius norm

$$\|\mathbf{A}\|_F = \sqrt{\sum_{i} \sum_{j} a_{ij}^2} \qquad (5)$$

In other words, an optimal UNN manifold consists of low-dimensional points $\mathbf{X}$ that minimize the reconstruction error of the data points $\mathbf{Y}$ w.r.t. KNN regression. Regularization in UNN regression may not be as important as regularization in other methods that fit into the unsupervised regression framework. For example, in UKR, regularization means penalizing extension in latent space with $E_p(\mathbf{X}) = E(\mathbf{X}) + \lambda \|\mathbf{X}\|$ and weight $\lambda$ [6]. In KNN regression, moving the low-dimensional data samples infinitely far apart from each other does not have the same effect, as long as we can still determine the K nearest neighbors, but extension can be penalized to avoid redundant solutions. For practical purposes (limitation of the size of numbers), it might be reasonable to restrict continuous KNN latent spaces, e.g., to $\mathbf{x} \in [0,1]^q$. In the following section, fixed latent space topologies are used that do not require further regularization.
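The UNN objective can be evaluated directly for a given latent matrix. The sketch below assumes Euclidean distances in latent space, index-based tie-breaking, and that $\mathbf{x}_i$ is excluded from its own neighborhood $\mathcal{N}_K(\mathbf{x}_i, \mathbf{X})$. The formulation above does not state this explicitly, but without it K = 1 would reconstruct every pattern perfectly for any $\mathbf{X}$. The name `unn_dsre` is hypothetical.

```python
import math


def unn_dsre(X, Y, K):
    """DSRE of a UNN embedding: each y_i is reconstructed by the mean of the
    patterns whose latent points are the K nearest neighbors of x_i in X.

    Assumption: x_i itself is excluded from its own neighborhood.
    """
    N, d = len(Y), len(Y[0])
    total = 0.0
    for i in range(N):
        others = [j for j in range(N) if j != i]
        # Latent neighborhood of x_i, ties broken by index
        nbrs = sorted(others, key=lambda j: (math.dist(X[i], X[j]), j))[:K]
        recon = [sum(Y[j][c] for j in nbrs) / K for c in range(d)]
        total += sum((Y[i][c] - recon[c]) ** 2 for c in range(d))
    return total / N


Y = [[0.0], [1.0], [2.0]]
good = [[0.0], [1.0], [2.0]]  # latent order matches the data order
bad = [[0.0], [2.0], [1.0]]   # two latent positions swapped
print(unn_dsre(good, Y, 1), unn_dsre(bad, Y, 1))  # 1.0 3.0
```

Only the relative latent positions enter through the neighborhoods, which illustrates why the absolute extension of the latent space matters less than in UKR.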

C. Iterative Strategy 1 

For KNN, not the absolute positions of data samples in latent space are relevant, but the relative positions that define the neighborhood relations. This perspective reduces the problem to a combinatorial search for neighborhoods $\mathcal{N}_K(\mathbf{x}_i, \mathbf{X})$ with $i = 1, \ldots, N$ that can be solved by testing all combinations of K-element subsets of N elements, i.e., all $\binom{N}{K}$ combinations. The problem is still difficult to solve, in particular for high dimensions. In the following, we introduce a combinatorial approach to UNN and two iterative local strategies.

The idea of our first iterative strategy (UNN 1) is to iteratively assign each data sample to the position in an existing latent space topology that leads to the lowest DSRE. We assume fixed neighborhood topologies with equidistant positions in latent space and therefore restrict the optimization problem of Equation (3) to a search in a subset of latent space.
Fig. 1. UNN 1: illustration of the embedding of a low-dimensional point into a fixed latent space topology w.r.t. the DSRE, testing all $N + 1$ positions.



As a simple variant, we consider the linear case of the latent variables arranged equidistantly on a line $x \in \mathbb{R}$. In this simplified case, only the order of the elements is important. The first iterative strategy works as follows:

1) Choose one element $\mathbf{y} \in \mathbf{Y}$,

2) test all $N + 1$ intermediate positions of the $N$ embedded elements in latent space,

3) choose the latent position with minimal $E(\mathbf{X})$ and embed $\mathbf{y}$,

4) remove $\mathbf{y}$ from $\mathbf{Y}$, and repeat from Step 1 until all elements have been embedded.
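The four steps can be sketched in pure Python for the one-dimensional case. This is an illustration under stated assumptions, not the authors' implementation: latent positions are the integer ranks 0, 1, 2, ...; each embedded pattern is reconstructed from its K nearest latent neighbors excluding itself (K is capped while fewer than K + 1 points are embedded); the left neighbor wins distance ties and the first slot wins DSRE ties. For simplicity the full DSRE of the embedded points is recomputed for every slot, so the sketch does not reproduce the cached-neighbor cost model discussed in the complexity analysis. `line_dsre` and `unn1` are hypothetical names.

```python
def line_dsre(order, Y, K):
    """DSRE of the patterns Y[order[0]], Y[order[1]], ... placed at the
    equidistant latent positions 0, 1, 2, ... on a line."""
    n, d = len(order), len(Y[0])
    if n < 2:
        return 0.0
    k = min(K, n - 1)  # cap K while few points are embedded (assumption)
    total = 0.0
    for p in range(n):
        # K nearest latent ranks, excluding p itself; left side wins ties
        nbrs = sorted((r for r in range(n) if r != p),
                      key=lambda r: (abs(p - r), r))[:k]
        recon = [sum(Y[order[r]][c] for r in nbrs) / k for c in range(d)]
        total += sum((Y[order[p]][c] - recon[c]) ** 2 for c in range(d))
    return total / n


def unn1(Y, K):
    """UNN 1 on a line: insert each pattern at the slot with minimal DSRE."""
    order = []  # indices of embedded patterns in latent (line) order
    for i in range(len(Y)):  # Step 1: take the patterns in the given order
        # Step 2: all len(order) + 1 slots between the embedded elements
        candidates = [order[:p] + [i] + order[p:] for p in range(len(order) + 1)]
        # Step 3: keep the slot with minimal DSRE (first slot on ties)
        order = min(candidates, key=lambda o: line_dsre(o, Y, K))
    return order  # Step 4: the loop ends once every pattern is embedded


Y = [[0.0], [3.0], [1.0], [2.0]]
print(unn1(Y, 1))  # [1, 3, 2, 0]: latent order of the values 3, 2, 1, 0
```

On this toy input the shuffled values end up sorted along the latent line (in descending order, which is an equally valid one-dimensional ordering).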

Figure 1 illustrates the $N + 1$ possible embeddings of a data sample into an existing order of points in latent space (yellow/bright circles). For example, the position of element $\mathbf{x}_3$ results in a lower DSRE with $K = 2$ than the position of $\mathbf{x}_5$, as the mean of the two nearest neighbors of $\mathbf{x}_3$ is closer to $\mathbf{y}$ than the mean of the two nearest neighbors of $\mathbf{x}_5$.

The complexity of UNN 1 can be described as follows. Each DSRE evaluation takes $Kd$ computations. We assume that the K nearest neighbors are saved in a list during the embedding for each latent point $\mathbf{x}$, so that the search for the indices $\mathcal{N}_K(\mathbf{x}, \mathbf{X})$ takes $O(1)$ time. The DSRE has to be computed for $N + 1$ positions, which takes $(N + 1) \cdot Kd$ steps, i.e., $O(N)$ time.
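The step counts can be checked numerically. The helpers below are hypothetical; they plug in the example values $N = 1{,}000$, $K = 10$, and $d = 100$ used in the next subsection, the $(N + 1) \cdot Kd$ count derived above for UNN 1, the $Nd + 2Kd$ count stated in the next subsection for UNN 2, and the number $\binom{N}{K}$ of K-element subsets that exhaustive search would have to test.

```python
from math import comb


def unn1_steps(N, K, d):
    """UNN 1: DSRE (K*d operations) evaluated at all N + 1 positions."""
    return (N + 1) * K * d


def unn2_steps(N, K, d):
    """UNN 2: N*d for the nearest-neighbor search plus two DSRE evaluations."""
    return N * d + 2 * K * d


N, K, d = 1000, 10, 100
print(unn1_steps(N, K, d))    # 1001000
print(unn2_steps(N, K, d))    # 102000
print(f"{comb(N, K):.3e}")    # number of K-element subsets for exhaustive search
```

Both heuristics are linear in N, while the exhaustive count exceeds $10^{23}$ for these values.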



D. Iterative Strategy 2 

The iterative approach introduced in the last section tests all intermediate positions of previously embedded latent points. We propose a second iterative variant (UNN 2) that only tests the neighboring intermediate positions in latent space of the nearest embedded point $\mathbf{y}^* \in \hat{\mathbf{Y}}$ in data space, where $\hat{\mathbf{Y}}$ denotes the set of already embedded patterns. The second iterative strategy works as follows:

1) Choose one element $\mathbf{y} \in \mathbf{Y}$,

2) look for the nearest $\mathbf{y}^* \in \hat{\mathbf{Y}}$ that has already been embedded (w.r.t. a distance measure like the Euclidean distance),

3) choose the latent position next to $\mathbf{y}^*$ with minimal $E(\mathbf{X})$ and embed $\mathbf{y}$,

4) remove $\mathbf{y}$ from $\mathbf{Y}$, add $\mathbf{y}$ to $\hat{\mathbf{Y}}$, and repeat from Step 1 until all elements have been embedded.
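A minimal pure-Python sketch of the UNN 2 loop on a one-dimensional latent line, under assumptions not fixed by the text: latent positions are integer ranks, each embedded pattern is reconstructed from its K nearest latent neighbors excluding itself (K capped while few points are embedded), the left neighbor wins distance ties, and the nearest embedded pattern $\mathbf{y}^*$ is found by Euclidean distance with ties broken by index. Only the two slots directly left and right of $\mathbf{y}^*$ are evaluated; for simplicity the full DSRE of the embedded points is recomputed per candidate. `line_dsre` and `unn2` are hypothetical names.

```python
import math


def line_dsre(order, Y, K):
    """DSRE of the patterns Y[order[0]], Y[order[1]], ... placed at the
    equidistant latent positions 0, 1, 2, ... on a line."""
    n, d = len(order), len(Y[0])
    if n < 2:
        return 0.0
    k = min(K, n - 1)  # cap K while few points are embedded (assumption)
    total = 0.0
    for p in range(n):
        # K nearest latent ranks, excluding p itself; left side wins ties
        nbrs = sorted((r for r in range(n) if r != p),
                      key=lambda r: (abs(p - r), r))[:k]
        recon = [sum(Y[order[r]][c] for r in nbrs) / k for c in range(d)]
        total += sum((Y[order[p]][c] - recon[c]) ** 2 for c in range(d))
    return total / n


def unn2(Y, K):
    """UNN 2 on a line: test only the two slots next to the nearest y*."""
    order = []  # indices of embedded patterns in latent (line) order
    for i in range(len(Y)):  # Step 1: take the patterns in the given order
        if not order:
            order = [i]
            continue
        # Step 2: nearest already-embedded pattern y* in data space
        star = min(order, key=lambda j: (math.dist(Y[i], Y[j]), j))
        p = order.index(star)
        # Step 3: insert directly left or right of y*, whichever has lower DSRE
        left = order[:p] + [i] + order[p:]
        right = order[:p + 1] + [i] + order[p + 1:]
        order = min([left, right], key=lambda o: line_dsre(o, Y, K))
    return order  # Step 4: the embedded set grows until all patterns are placed


Y = [[0.0], [3.0], [1.0], [2.0]]
print(unn2(Y, 1))  # [1, 3, 2, 0]
```

Compared with testing all slots, each insertion here needs one linear scan over the embedded patterns in data space and only two DSRE evaluations.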

Figure 2 illustrates the embedding of a 2-dimensional point $\mathbf{y}$ (yellow) to the left or right of the nearest point $\mathbf{y}^*$ in data space. The position with the lowest DSRE is chosen. In comparison to UNN 1, $N$ distance comparisons in data space have to be computed, but only 2 positions have to be tested w.r.t. the data space reconstruction error. UNN 2 computes the nearest embedded point $\mathbf{y}^*$ for each data point, which takes $Nd$ steps. The DSRE has to be computed only for the two neighboring positions, resulting in an overall number of $Nd + 2Kd$ steps, i.e., $O(N)$ time. Because of the multiplicative constants, UNN 2 is faster in practice. For example, for $N = 1{,}000$, $K = 10$, and $d = 100$, UNN 1 takes 1,001,000 steps, while UNN 2 takes 102,000 steps. Testing all combinations takes $\binom{1000}{10}$ steps, which is not computable in reasonable time. The following experimental section answers the question of whether this speedup of UNN 2 has to be paid for with worse DSREs.

Fig. 2. UNN 2: testing only the neighboring positions of the nearest point $\mathbf{y}^*$ in data space.

E. Experiments

This section shows the behavior of the iterative strategies on three test problems. We compare the DSRE of both strategies to the initial DSRE at the end of this section.

1) 2D-S: First, we compare UNN 1 and UNN 2 on a simple 2-dimensional data set, i.e., the 2-dimensional noisy S with $N = 200$ (2D-S). Figure 3 shows the experimental results with $K = 5$ nearest neighbors. Similar colors correspond to neighboring latent points. Part (a) shows a UNN 1 embedding of the 2D-S data set. Part (b) shows the embedding of the same data set with UNN 2. The colors of both embeddings show a satisfying topological sorting, although we can observe local optima.

Fig. 3. (a) UNN 1 and (b) UNN 2 embedding with $K = 5$ on 2D-S.

2) 3D-S: In the following, we test UNN regression on a 3-dimensional S data set (3D-S). The variant without a hole consists of 500 data points; the variant with a hole in the middle consists of 400 points. Figure 4(a) shows the order of elements of the 3D-S data set without a hole at the beginning. The corresponding embedding with UNN 1 and $K = 10$ is shown in Part (b) of the figure. Again, similar colors correspond to neighboring points in latent space. Part (c) of Figure 4 shows the UNN 2 embedding, which achieves similar results. Also for the UNN embedding of the S data set with a hole, see Part (d) of the figure, reasonable neighborhood assignments can be observed. Quantitative results for the DSRE are reported in Table I.

Fig. 4. Results of UNN on 3D-S: (a) the unsorted S at the beginning, (b) the embedded S with UNN 1 and $K = 10$, (c) the embedded S with UNN 2 and $K = 10$, and (d) a variant of S with a hole embedded with UNN 2.

3) USPS Digits: Next, we experimentally test UNN regression on test problems from the USPS digits data set [4]. For this purpose, we take 100 data samples of 256-dimensional (16 × 16 pixels) pictures of handwritten 2's and 5's. We embed a one-dimensional manifold and show the high-dimensional data that is assigned to every 14th latent point, i.e., digits that are adjacent in the plot are also close to each other in latent space. Figure 5 shows the result of the UNN 2 embedding for 2's and 5's with $K = 10$. We can observe that neighboring digits are similar to each other, while digits that are dissimilar are further away from each other in latent space.

Fig. 5. UNN 2 embeddings of USPS digits: (a) 2's and (b) 5's. Digits are shown that are assigned to every 14th embedded latent point. Similar digits are neighbors in latent space.

4) DSRE Comparison: Finally, we compare the DSRE achieved by both strategies with the initial DSRE and with the DSRE achieved by LLE on all test problems. For the USPS digits data set, we choose the digit 7. Table I shows the experimental results for three settings of the neighborhood size K. The lowest DSRE for each problem and setting of K is marked with an asterisk. After application of the iterative strategies, the DSRE is considerably lower than initially. In general, increasing K results in higher DSREs. With the exception of LLE with $K = 10$ on 2D-S, the UNN 1 strategy always achieves the best results. UNN 1 achieves lower DSREs than UNN 2, with the exception of 2D-S and $K = 10$. The gain in accuracy has to be paid for with a constant runtime factor that may play an important role in case of large data sets or high data space dimensions.

TABLE I
Comparison of DSRE for the initial data set and after embedding with strategies UNN 1 and UNN 2, and with LLE.

Problem        K     init     UNN 1     UNN 2     LLE
2D-S           2    201.6     19.6*     29.2      25.5
               5    290.0     27.1*     70.1      37.7
              10    309.2     66.3      64.7      40.6*
3D-S           2    691.3    101.9*    140.4     135.0
               5    904.5    126.7*    244.4     514.3
              10    945.80   263.39*   296.5     583.6
3D-S hole      2    577.0     80.7*    101.8      94.9
               5    727.6    108.1*    204.4     198.9
              10    810.7    216.4*    346.8     387.4
digits (7)     2    196.6    139.0*    145.3     147.8
               5    248.2    179.3*    195.4     198.1
              10    265.2    216.6*    222.1     217.8

(* marks the lowest DSRE in each row.)

IV. Conclusions 

With UNN regression, we have fitted a fast regression technique into the unsupervised setting for dimensionality reduction. The two iterative UNN strategies are efficient methods to embed high-dimensional data into a fixed one-dimensional latent space, taking $O(N)$ time. The speedup is achieved by restricting the number of possible solutions (reduction of the solution space) and applying fast iterative heuristics. Both methods performed well on the test problems in these first experimental analyses. UNN 1 achieves lower DSREs, but UNN 2 is faster because of the multiplicative constants of UNN 1. Our future work will concentrate on the analysis of the local optima the UNN strategies approximate, and on how the approach can be extended to guarantee globally optimal solutions. Furthermore, the UNN strategies can be extended to latent topologies with higher dimensionality. For $q = 2$, the insertion of intermediate solutions into a grid is more difficult: it results in shifting rows and columns of the grid, and thus changes the latent topology in parts where this may not be desired. A simple stochastic search strategy can be employed that randomly swaps positions of latent points in the grid.
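The swap strategy is only outlined in the text. A minimal sketch, under assumptions that are not stated there: the N patterns fill the cells of a grid with a fixed number of columns row by row, the latent neighbors of a cell are the K other cells closest in Euclidean grid distance (ties broken by cell index), and a random swap is kept only if it does not increase the DSRE. The names `grid_dsre` and `swap_search` and the acceptance rule are hypothetical.

```python
import math
import random


def grid_dsre(assign, Y, K, cols):
    """DSRE of a 2-D grid embedding: cell c holds pattern assign[c] and has
    latent coordinates (c // cols, c % cols)."""
    n, d = len(assign), len(Y[0])
    pos = [(c // cols, c % cols) for c in range(n)]
    total = 0.0
    for c in range(n):
        # K closest other cells in the grid, ties broken by cell index
        nbrs = sorted((o for o in range(n) if o != c),
                      key=lambda o: (math.dist(pos[c], pos[o]), o))[:K]
        recon = [sum(Y[assign[o]][j] for o in nbrs) / K for j in range(d)]
        total += sum((Y[assign[c]][j] - recon[j]) ** 2 for j in range(d))
    return total / n


def swap_search(Y, K, cols, iters=500, seed=0):
    """Randomly swap two cells; keep the swap only if the DSRE does not grow."""
    rng = random.Random(seed)
    assign = list(range(len(Y)))  # start from the given order
    best = grid_dsre(assign, Y, K, cols)
    for _ in range(iters):
        a, b = rng.sample(range(len(Y)), 2)
        assign[a], assign[b] = assign[b], assign[a]
        err = grid_dsre(assign, Y, K, cols)
        if err <= best:
            best = err
        else:
            assign[a], assign[b] = assign[b], assign[a]  # undo the swap
    return assign, best


pts = [4, 0, 8, 2, 6, 1, 7, 3, 5]
Y = [[float(i % 3), float(i // 3)] for i in pts]  # shuffled 3 x 3 lattice
assign, best = swap_search(Y, K=2, cols=3, iters=300, seed=1)
print(best <= grid_dsre(list(range(9)), Y, 2, 3))  # True
```

Because swaps only exchange cell contents, the grid topology itself stays fixed, which avoids the row and column shifts that insertion would cause; like the one-dimensional strategies, this greedy acceptance rule can get stuck in local optima.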